METHOD OF GENERATING AN EXCEPTIONAL PRONUNCIATION
DICTIONARY FOR AUTOMATIC KOREAN PRONUNCIATION GENERATOR 

CROSS REFERENCE TO RELATED APPLICATION DATA 
This application claims the benefit of PCT Application No. PCT/KR2003/001187,
filed on 17 June 2003 and incorporated herein by reference.

BACKGROUND OF THE INVENTION 

1. Field of the Invention

The present invention relates to a method of generating an exceptional pronunciation
dictionary for an automatic Korean pronunciation generator in a Text-to-Speech (TTS) system
or in an automatic speech recognition (ASR) system.

2. Description of the Related Art

Conventionally, a method for automatic Korean pronunciation generation, as shown in
FIG. 1, comprises the steps of analyzing and pre-processing the input text; analyzing the
morphemes of the text; tagging parts of speech (POS); and generating pronunciations based on
an exceptional pronunciation dictionary and a set of regular rules for changing phonemes.
The automatic Korean pronunciation generator is thus characterized by two parts: the
dictionary of exceptional words and the regular rules for changing phonemes. Exceptional
words have so far been recorded in the dictionary in a simple and unsystematic manner,
whereas research on the regular rules for changing phonemes has progressed actively.

One example of the regular rules is the Fortition of lenis consonants: e.g., the Korean word
'극비(kIkpi)' is pronounced as [극삐(kIkbi)]. Thus, under the Fortition rule, the Korean
letter 'ㅂ(p)' after 'ㄱ(k)' is pronounced as [ㅃ(b)]. The Fortition rule in fact covers
'ㄷ(t), ㄱ(k), ㅅ(s), ㅈ(c)' as well as 'ㅂ(p)': after 'ㄱ(k)' they are respectively pronounced as
[ㄸ(d), ㄲ(g), ㅆ(S), ㅉ(z)]. More generally, when a Korean obstruent letter 'ㅂ(p), ㄷ(t), ㄱ(k),
ㅅ(s), or ㅈ(c)' of a Korean word is positioned after another Korean obstruent letter, the
'ㅂ(p), ㄷ(t), ㄱ(k), ㅅ(s), ㅈ(c)' are respectively pronounced as [ㅃ(b), ㄸ(d), ㄲ(g), ㅆ(S), ㅉ(z)].

This Fortition Rule has no exceptions in a given environment. 
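
The regular rule above can be sketched in code. The following Python sketch is illustrative only and is not part of the disclosed method: the helper names (`decompose`, `compose`, `apply_fortition`) are hypothetical, and the set of preceding obstruents is assumed, for brevity, to be the simple finals ㄱ, ㄷ, and ㅂ.

```python
# Illustrative sketch of the regular Fortition rule (assumed helper names).
HANGUL_BASE = 0xAC00      # first precomposed Hangul syllable
SYLLABLE_COUNT = 11172    # number of precomposed Hangul syllables
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
# Final consonants in Unicode order; index 0 means "no final consonant".
FINALS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ",
          "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]
FORTIS = {"ㅂ": "ㅃ", "ㄷ": "ㄸ", "ㄱ": "ㄲ", "ㅅ": "ㅆ", "ㅈ": "ㅉ"}
OBSTRUENT_FINALS = {"ㄱ", "ㄷ", "ㅂ"}  # assumption: simple stop finals only


def is_syllable(ch):
    """True if ch is a precomposed Hangul syllable."""
    return 0 <= ord(ch) - HANGUL_BASE < SYLLABLE_COUNT


def decompose(ch):
    """Split a syllable into (initial, medial, final) indices."""
    initial, rest = divmod(ord(ch) - HANGUL_BASE, 21 * 28)
    medial, final = divmod(rest, 28)
    return initial, medial, final


def compose(initial, medial, final):
    """Build a syllable from (initial, medial, final) indices."""
    return chr(HANGUL_BASE + (initial * 21 + medial) * 28 + final)


def apply_fortition(word):
    """Turn a lenis initial into its fortis counterpart after an obstruent final."""
    out = list(word)
    for i in range(1, len(word)):
        prev, cur = word[i - 1], word[i]
        if not (is_syllable(prev) and is_syllable(cur)):
            continue
        prev_final = FINALS[decompose(prev)[2]]
        initial, medial, final = decompose(cur)
        lenis = INITIALS[initial]
        if prev_final in OBSTRUENT_FINALS and lenis in FORTIS:
            out[i] = compose(INITIALS.index(FORTIS[lenis]), medial, final)
    return "".join(out)
```

Under these assumptions, `apply_fortition("극비")` yields the spelling-like pronunciation "극삐", while a word such as "물고기", whose relevant consonant follows ㄹ rather than an obstruent, is left unchanged by the regular rule.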

By contrast, alternative pronunciations can be observed in certain contexts, in
which the choice of pronunciation depends on the individual word (it is idiosyncratic). It is
impossible to make rules for these words, which should therefore be classified as words for
the Exceptional Pronunciation Dictionary in TTS or ASR. For example, '물고기[mulkoki]' and
'불고기[pulkoki]' are respectively realized as [물꼬기][mulgoki] and [불고기][pulkoki]. In
'불고기[pulkoki]', the letter 'ㄱ[k]' located after the letter 'ㄹ[l]' is pronounced as [ㄱ][k], while
in '물고기[mulkoki]', the letter 'ㄱ[k]' located after the letter 'ㄹ[l]' is pronounced as [ㄲ][g].

The Fortition in [물꼬기][mulgoki] is an exceptional case, which is not predictable and
needs to be recorded as an entry of the Exceptional Pronunciation Dictionary.
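
How such an entry would be consulted can be sketched as follows. This is a minimal Python sketch under stated assumptions: the function name `pronounce`, the one-entry dictionary, and the placeholder rule function `no_change` are hypothetical and only illustrate that the dictionary is checked before any regular rule is applied.

```python
def pronounce(word, exception_dict, apply_rules):
    """Return the listed exceptional pronunciation, else fall back to the rules."""
    if word in exception_dict:
        return exception_dict[word]
    return apply_rules(word)


# Illustrative data only: one entry taken from the example in the text.
exception_dict = {"물고기": "물꼬기"}


def no_change(word):
    """Placeholder for the regular phoneme rules (assumed to leave these words as-is)."""
    return word


fish = pronounce("물고기", exception_dict, no_change)
bulgogi = pronounce("불고기", exception_dict, no_change)
```

With this ordering, the two words receive different pronunciations even though they share the same phoneme context, because only one of them is listed in the dictionary.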

Generating exceptional pronunciations is known to be a challenging problem for
Korean TTS and speech recognition systems, but very little research has been conducted on
it. To address it, the characteristics of words having exceptional pronunciations need to be
established in advance.

DISCLOSURE OF INVENTION 
Therefore, it is an object of the present invention to provide a method for generating
an exceptional pronunciation dictionary for an automatic Korean pronunciation generator by
reviewing the words in a text corpus which have exceptional pronunciations, based on the
characteristics of such words as established through phonological research and text analysis
of the Korean language.

BRIEF DESCRIPTION OF DRAWINGS
This invention will be better understood and its various objects and advantages will
be fully appreciated from the following descriptions taken in conjunction with the
accompanying drawings, in which:

FIG. 1 shows a block diagram of an automatic pronunciation generator; 
FIG. 2 shows a method for compiling an exceptional pronunciation dictionary 1
using a general dictionary; and

FIG. 3 shows a method for compiling a new exceptional pronunciation dictionary 2
using a text corpus.



BEST MODE FOR CARRYING OUT THE INVENTION 
This invention comprises the steps of (1) setting exceptional pronunciation conditions;
(2) compiling an exceptional pronunciation dictionary using a general dictionary; and (3)
compiling the exceptional pronunciation dictionary using a text corpus.

The step of setting exceptional pronunciation conditions establishes the phoneme
conditions under which exceptional pronunciations are observed, based on systematic
research in Korean phonology and on text analysis. Although it has been thought that the
phoneme conditions of exceptional pronunciations cannot be explained by any rules, the
present disclosure shows, based on thorough research, that they are regular: words showing
exceptional pronunciations in Korean are observed only in certain limited conditions.

The step of generating the exceptional pronunciation dictionary includes the
following two steps.

The first step is to generate an exceptional pronunciation dictionary by analyzing
words having exceptional pronunciations in a general Korean dictionary. Using a general
Korean dictionary minimizes the repetition of vocabulary and allows different kinds of
vocabulary to be included in the exceptional pronunciation dictionary. The general Korean
dictionary analyzed in this research is the YEONSEI KOREAN DICTIONARY (YKD
henceforth), which records about 50,000 high-frequency entry words. To generate an
exceptional pronunciation dictionary, an exceptional condition reference dictionary, which
includes the words appearing in the exceptional pronunciation conditions, first needs to be
established using YKD. The exceptional pronunciation dictionary is then generated by manual
review of the words listed in the exceptional condition reference dictionary.

However, vocabulary absent from the general dictionary is also used in actual
economic and social life. Furthermore, new words are constantly being coined as living
conditions change, such as the new words observed in newspaper or broadcast texts. In the
second step, these words are extracted from a text corpus and listed in the exceptional
pronunciation dictionary.

(1) Setting exceptional pronunciation conditions 



The exceptional pronunciation conditions are the phonological conditions in which
exceptional pronunciations are observed.

Accordingly, systematic phonological conditions were first investigated, through text
analysis, based on the characteristics of the words with exceptional pronunciations.

The words which have exceptional pronunciations are nouns and their derivatives,
which are declinable parts of speech in Korean.

In the following description, the phonological conditions in which the exceptional
pronunciations are observed are disclosed.

Generally, phonological conditions include four different cases: the first is when a
consonant follows a vowel; the second, when a consonant follows a preceding consonant; the
third, when a vowel follows a vowel; and the fourth, when a vowel follows a consonant.
Among the above four cases, the phonological conditions for the exceptional
pronunciations are the second case, when a consonant follows another preceding consonant,
and the fourth case, when a vowel follows a consonant. When a consonant follows another
preceding consonant, the preceding consonant is a voiced sound such as 'ㅁ[m]', 'ㄴ[n]', or
'ㅇ[N]', and the following consonant is a lenis sound. In this context, no regular phoneme
rule can be applied, but the lenis sound is pronounced as fortis in some words and not in
others. As in the example shown above, '물고기[mulkoki]' and '불고기[pulkoki]' are
respectively realized as [물꼬기][mulgoki] and [불고기][pulkoki]. In '불고기[pulkoki]', the
letter 'ㄱ[k]' located after the letter 'ㄹ[l]' is pronounced as [ㄱ][k], while in
'물고기[mulkoki]', the letter 'ㄱ[k]' located after the letter 'ㄹ[l]' is pronounced as [ㄲ][g].
Such words, which have different pronunciations in the same phoneme context, are
exceptional pronunciation words and are eventually recorded in the exceptional pronunciation
dictionary.
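
The consonant-after-consonant condition can be tested mechanically, which is what makes candidate extraction possible. The Python sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, and the preceding sounds are taken to be the finals ㅁ, ㄴ, ㅇ, and ㄹ and the following lenis sounds the initials ㅂ, ㄷ, ㄱ, ㅅ, and ㅈ. It only flags candidates; it does not decide whether a word is actually exceptional.

```python
HANGUL_BASE = 0xAC00      # first precomposed Hangul syllable
SYLLABLE_COUNT = 11172    # number of precomposed Hangul syllables
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
# Final consonants in Unicode order; index 0 means "no final consonant".
FINALS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ",
          "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]
VOICED_FINALS = {"ㅁ", "ㄴ", "ㅇ", "ㄹ"}          # assumed preceding sounds
LENIS_INITIALS = {"ㅂ", "ㄷ", "ㄱ", "ㅅ", "ㅈ"}    # assumed following sounds


def initial_and_final(ch):
    """Return (initial, final) letters of a Hangul syllable, or None otherwise."""
    offset = ord(ch) - HANGUL_BASE
    if not 0 <= offset < SYLLABLE_COUNT:
        return None
    initial, rest = divmod(offset, 21 * 28)
    return INITIALS[initial], FINALS[rest % 28]


def in_consonant_condition(word):
    """True if a voiced final is directly followed by a lenis initial."""
    for prev, cur in zip(word, word[1:]):
        a, b = initial_and_final(prev), initial_and_final(cur)
        if a and b and a[1] in VOICED_FINALS and b[0] in LENIS_INITIALS:
            return True
    return False
```

Both "물고기" and "불고기" satisfy this check, which matches the point made above: the condition identifies the context, and only word-by-word review can tell which of the two is exceptional.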

When a vowel follows a consonant, two cases can be observed, as detailed below.
In one case, when the consonant is 'ㅅ[s]', the 'ㅅ[s]' is pronounced as 'ㄴ[n]' or as 'ㄷ[t]'
in the same condition, for example '아랫니' [a-lEn-ni] and '덧없이' [tvt-vp-si]. In the other
case, a letter 'ㄴ[n]' is inserted between the consonant and the vowel. For example,
'앞일[aP-il]' is pronounced as [암닐, am-nil].
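
The vowel-after-consonant condition can be flagged in a similar way. The Python sketch below is only an illustration under loudly stated assumptions: the function names are hypothetical; a syllable written with the silent initial ㅇ is treated as vowel-initial; a final ㅅ is assumed to qualify before any vowel; and any other final consonant is assumed to qualify only before an i/y-type vowel, following the column label V(i/y) in Table 1.

```python
HANGUL_BASE = 0xAC00      # first precomposed Hangul syllable
SYLLABLE_COUNT = 11172    # number of precomposed Hangul syllables
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
# Final consonants in Unicode order; index 0 means "no final consonant".
FINALS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ",
          "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]
I_Y_VOWELS = set("ㅣㅑㅒㅕㅖㅛㅠ")  # assumption: vowels beginning with i or y


def letters(ch):
    """Return (initial, medial, final) letters of a Hangul syllable, or None."""
    offset = ord(ch) - HANGUL_BASE
    if not 0 <= offset < SYLLABLE_COUNT:
        return None
    initial, rest = divmod(offset, 21 * 28)
    medial, final = divmod(rest, 28)
    return INITIALS[initial], MEDIALS[medial], FINALS[final]


def in_vowel_condition(word):
    """True if a final consonant is followed by a qualifying vowel-initial syllable."""
    for prev, cur in zip(word, word[1:]):
        a, b = letters(prev), letters(cur)
        if not (a and b):
            continue
        final, next_initial, next_vowel = a[2], b[0], b[1]
        if final and next_initial == "ㅇ":
            if final == "ㅅ" or next_vowel in I_Y_VOWELS:
                return True
    return False
```

As with the consonant case, this check only selects candidates for review; it says nothing about which pronunciation a flagged word actually takes.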

In this invention, the conditions of the exceptional pronunciations are arranged based
on an analysis of YKD.

Table 1 below shows the conditions in which the exceptional pronunciations
are observed, and Table 2 shows examples for each condition.

[Table 1] Exceptional pronunciation conditions

Preceding sound (rows):    ㅁ[m], ㄴ[n], ㅇ[N], ㄹ[l], C, ㅅ[s]
Following sound (columns): ㅂ[p], ㄷ[t], ㅅ[s], ㅈ[c], ㄱ[k], ㅇ[N], V(i/y)

(C: Consonant, V: Vowel)

[Table 2] Examples of exceptional pronunciations

(rows: preceding sound; columns: following sound)

       [p]            [t]            [s]            [c]            [k]            [N]            V(i/y)
[m]    [bom-bi]       [bo-rIm-dal]   [sum-So-li]    [hIm-zip]      [gum-gil]
[n]    [nun-byvN]     [non-duk]      [nun-Sal]      [kwan-zvm]     [nun-ga]       [bvm-sin-non]
[N]    [dEN-...]      [caN-dok]      [daN-Sok]      [doN-zvk]      [daN-gul]
[l]    [dIl-bo]       [kal-dE]       [kyvl-San]     [kyvl-zE]      [dIl-gvt]
C                                                                                                [am-nil]
[s]                                                                                              [ut-ot]





(2) Compiling an exceptional pronunciation dictionary using a general dictionary (YKD)



A reference dictionary 1 is compiled by extracting, using Table 1, the words in
the exceptional conditions from the entries of a general dictionary, which includes the basic
words of the Korean language.

A researcher manually reviews the words of the reference dictionary 1 in the exceptional
conditions and edits an exceptional pronunciation dictionary 1 by collecting the words which
show exceptional pronunciations.
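
The two operations of this step, extraction and manual review, can be sketched as follows. This Python sketch is illustrative only: the function names are hypothetical, the condition test is passed in as a parameter (here a toy stand-in rather than a real phonological check), and the manual review by a researcher is represented by a callable that returns a pronunciation or `None`.

```python
def build_reference_dictionary(entries, in_condition):
    """Keep the dictionary entries that appear in an exceptional condition."""
    return sorted({word for word in entries if in_condition(word)})


def build_exception_dictionary(reference_words, review):
    """Collect the words for which the reviewer gives an exceptional pronunciation."""
    exceptions = {}
    for word in reference_words:
        pronunciation = review(word)  # stands in for the manual review
        if pronunciation is not None:
            exceptions[word] = pronunciation
    return exceptions


# Toy data for illustration; a real run would use the general dictionary entries.
entries = ["물고기", "불고기", "나무"]
toy_condition = {"물고기", "불고기"}.__contains__   # stand-in for the Table 1 check
reviewer_notes = {"물고기": "물꼬기"}                # stand-in for a researcher's decisions

reference_1 = build_reference_dictionary(entries, toy_condition)
exception_1 = build_exception_dictionary(reference_1, reviewer_notes.get)
```

Keeping the reference dictionary separate from the exceptional pronunciation dictionary records which words have already been reviewed, including those found to be regular.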

(3) Compiling an exceptional pronunciation dictionary based on a text corpus



The text corpus is basically a collection of sentences, which are analyzed,
pre-processed, and divided into Eojols (units delimited by spaces). The Eojols in the
exceptional conditions then form the vocabulary dictionary 1 in the exceptional conditions.

Next, the vocabulary dictionary 1 in the exceptional conditions is compared with
the words included in the reference dictionary 1 in the exceptional conditions generated in the
previous step. As a result of the comparison, the vocabulary dictionary 2 in the exceptional
conditions is generated after removing repeated words.

The exceptional pronunciation dictionary 2 is compiled by extracting additional
words having exceptional pronunciations through manual review of the vocabulary dictionary
2 in the exceptional conditions.

The new reference dictionary 2 in the exceptional conditions is created by merging
the vocabulary dictionary 2 in the exceptional conditions and the reference dictionary 1 in the
exceptional conditions. When an exceptional pronunciation dictionary is later compiled from
a new text corpus, the new reference dictionary 2 in the exceptional conditions will be used
as the reference dictionary.
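
The corpus-based step amounts to a few set operations, sketched below in Python. The sketch is illustrative and makes several assumptions: the function names are hypothetical, pre-processing is reduced to whitespace splitting, the condition test is a toy stand-in passed as a parameter, and manual review is again represented by a callable returning a pronunciation or `None`.

```python
def split_eojols(sentences):
    """Divide sentences into Eojols (space-delimited units)."""
    eojols = []
    for sentence in sentences:
        eojols.extend(sentence.split())
    return eojols


def update_from_corpus(sentences, reference_1, in_condition, review):
    """Return (vocabulary dictionary 2, exceptional dictionary 2, reference dictionary 2)."""
    # Vocabulary dictionary 1: corpus Eojols that appear in an exceptional condition.
    vocabulary_1 = {e for e in split_eojols(sentences) if in_condition(e)}
    # Vocabulary dictionary 2: drop words already covered by reference dictionary 1.
    vocabulary_2 = vocabulary_1 - set(reference_1)
    # Exceptional dictionary 2: words the reviewer marks as exceptional.
    exception_2 = {}
    for word in sorted(vocabulary_2):
        pronunciation = review(word)  # stands in for the manual review
        if pronunciation is not None:
            exception_2[word] = pronunciation
    # Reference dictionary 2: everything reviewed so far, for use with the next corpus.
    reference_2 = set(reference_1) | vocabulary_2
    return vocabulary_2, exception_2, reference_2


# Toy data for illustration only.
sentences = ["물고기 눈가 나무", "불고기 눈가"]
reference_1 = {"물고기", "불고기"}
toy_condition = {"물고기", "불고기", "눈가"}.__contains__
reviewer_notes = {"눈가": "눈까"}

vocabulary_2, exception_2, reference_2 = update_from_corpus(
    sentences, reference_1, toy_condition, reviewer_notes.get
)
```

Because only the words in vocabulary dictionary 2 reach the reviewer, words already examined in the previous step are not reviewed again, which is the purpose of carrying the reference dictionary forward.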

Thus, the method contributes to improving the performance of the automatic Korean
pronunciation generator, as well as that of Korean speech recognition and TTS systems.